-
Notifications
You must be signed in to change notification settings - Fork 200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Log model properties #677
Log model properties #677
Conversation
@@ -56,6 +56,10 @@ class OPENVINO_GENAI_EXPORTS ContinuousBatchingPipeline { | |||
|
|||
PipelineMetrics get_metrics() const; | |||
|
|||
std::vector<std::string> get_model_configuration(); | |||
|
|||
void print_model_configuration(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that instead of adding debug API to public API, we can either:
- Enable compiled time cmake option to print some debug information.
- Or do it via environment variable. In this case we need to ensure that debug info handling does not affect performance.
It can be useful for timers which are currently profile each iteration / step. I vote for cmake option with debug information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have moved the functionality to utils as we want to have it every time we use cb_pipeline.
951d09e
to
9a1b7a9
Compare
@@ -61,6 +61,8 @@ class OPENVINO_GENAI_EXPORTS ContinuousBatchingPipeline { | |||
GenerationHandle add_request(uint64_t request_id, const ov::Tensor& input_ids, const ov::genai::GenerationConfig& sampling_params); | |||
GenerationHandle add_request(uint64_t request_id, const std::string& prompt, const ov::genai::GenerationConfig& sampling_params); | |||
|
|||
std::string get_model_configuration_string(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm against of adding API method for debug purposes.
Can we enable it via env var?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Env var would be problematic. Maybe we can add a flag in the constructor that is set to false by default ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is it problematic? once model is compiled, we can dump this information based on env var.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we do not handle ENV variables in OVMS runtime, this is extra capability reserved only for CLOUD services for example secrets or endpoints. We can add a graph llm options parameter, then we can pass it to the constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should store loaded model configuration and provide getter for it.
In current shape it indeed looks like a debug method as the string it returns is not very useful except for printing.
I would change it to return a map with properties represented as {key, value}. This way user would be able to conveniently check model configuration and possibly make decisions in their own applications based on value of certain properties. It might also be useful for UX and monitoring purposes.
The full_log
flag in pipeline constructor is unnecessary in my opinion. I think we should store configuration either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would consider this method as informative so not only for debugging purposes. The output of the method in a form of vector of strings can be used for various purposes on the client application side just like such API is in OV Runtime.
Introducing extra env variable OV_CB_FULL_LOG to mute the output of the method is in my opinion totally useless. I suggest to keep the same output from get_model_configuration(). It will be the client application to use it or not.
It will be used in OVMS to track the configuration of the model for monitoring and benchmarking purposes.
05580ef
to
608b361
Compare
608b361
to
7eb692e
Compare
7eb692e
to
f39478f
Compare
Adding possibility to get and log compiled model properties.
Very useful for benchmarking and comparing settings on different hardware.
Example output:
Loaded model configuration:
AFFINITY: CORE
CPU_DENORMALS_OPTIMIZATION: NO
CPU_SPARSE_WEIGHTS_DECOMPRESSION_RATE: 1
DYNAMIC_QUANTIZATION_GROUP_SIZE: 32
ENABLE_CPU_PINNING: YES
ENABLE_HYPER_THREADING: NO
EXECUTION_DEVICES: CPU
EXECUTION_MODE_HINT: PERFORMANCE
INFERENCE_NUM_THREADS: 24
INFERENCE_PRECISION_HINT: f32
KV_CACHE_PRECISION: f16
LOG_LEVEL: LOG_NONE
MODEL_DISTRIBUTION_POLICY:
NETWORK_NAME: Model0
NUM_STREAMS: 1
OPTIMAL_NUMBER_OF_INFER_REQUESTS: 1
PERFORMANCE_HINT: LATENCY
PERFORMANCE_HINT_NUM_REQUESTS: 0
PERF_COUNT: YES
SCHEDULING_CORE_TYPE: ANY_CORE